Perf/export fixes #1

Closed
wants to merge 43 commits into from

borisfom

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

erhoo82 and others added 30 commits February 9, 2023 13:44
* per-micro-batch input loader

* per-micro-batch input loader

set arg default val

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

* apply per-microbatch-loader to only GPT

* update docstring on micro-batch input loader

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed the default arg val

* fix batch size to 1 at log stat registration

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update container for CI

Signed-off-by: ericharper <[email protected]>

* update container in jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update container for CI

Signed-off-by: ericharper <[email protected]>

fix merge conflict

* revert Jenkinsfile

* Revert "revert Jenkinsfile"

This reverts commit d23b775.

* Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py

Signed-off-by: Tim Moon <[email protected]>

* add GradScaler

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Remove everything else

Signed-off-by: smajumdar <[email protected]>

* Support dataclass in AbstractRNNTDecoding

Signed-off-by: smajumdar <[email protected]>

* Add first draft unittest

Signed-off-by: smajumdar <[email protected]>

* Correct the logic to move to the next timestep in the alignment

Signed-off-by: smajumdar <[email protected]>

* Finalize ALSD alignment generation

Signed-off-by: smajumdar <[email protected]>

* Add support for TSD greedy alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Add support for mAES greedy alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Finalize extraction of alignments from all beam algorithms for RNNT

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Add copyright

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
* Base code for AWS SageMaker example

Signed-off-by: SeanNaren <[email protected]>

* Remove format

Signed-off-by: SeanNaren <[email protected]>

* wrap

Signed-off-by: SeanNaren <[email protected]>

* Add a notebook with the code

Signed-off-by: SeanNaren <[email protected]>

* Setup

Signed-off-by: SeanNaren <[email protected]>

* Update notebook

Signed-off-by: SeanNaren <[email protected]>

* Remove space

Signed-off-by: SeanNaren <[email protected]>

* Fix spelling mistake

Signed-off-by: SeanNaren <[email protected]>

* Add message to explain usage

Signed-off-by: SeanNaren <[email protected]>

* Add CommonVoice esperanto example

Signed-off-by: SeanNaren <[email protected]>

* Fix path

Signed-off-by: SeanNaren <[email protected]>

* Fixes

Signed-off-by: SeanNaren <[email protected]>

* Import sox locally, add documentation

Signed-off-by: SeanNaren <[email protected]>

* Address reviews

Signed-off-by: SeanNaren <[email protected]>

* Address reviews

Signed-off-by: SeanNaren <[email protected]>

* Address reviews

Signed-off-by: SeanNaren <[email protected]>

* Add cell to download the SSL model

Signed-off-by: SeanNaren <[email protected]>

* Set max epochs to 300

Signed-off-by: SeanNaren <[email protected]>

* Fixes, introduce HF dataset instructions

Signed-off-by: SeanNaren <[email protected]>

* Upstream updates from other branch

Signed-off-by: SeanNaren <[email protected]>

* Fix warning

Signed-off-by: SeanNaren <[email protected]>

* Add README, add image

Signed-off-by: SeanNaren <[email protected]>

* Fix warning

Signed-off-by: SeanNaren <[email protected]>

* Address feedback

Signed-off-by: SeanNaren <[email protected]>

* Feedback

Signed-off-by: SeanNaren <[email protected]>

---------

Signed-off-by: SeanNaren <[email protected]>
* Add papers from 2022/2022 to PUBLICATIONS.md

Signed-off-by: smajumdar <[email protected]>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <[email protected]>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <[email protected]>

* Add additional papers

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
…it tests (#5980) (#5984)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
…ict-0.7b_nv22.10.txt (#5869)

* removed WHATEVER(1)  ˌhwʌˈtɛvɚ

Signed-off-by: MikyasDesta <[email protected]>

* remove WHATEVER(1) and WHATEVER's(1)

Signed-off-by: MikyasDesta <[email protected]>

* removed nv22.10.txt

Signed-off-by: MikyasDesta <[email protected]>

* added updated and removed words to notes

Signed-off-by: MikyasDesta <[email protected]>

* sign off

Signed-off-by: MikyasDesta <[email protected]>

---------

Signed-off-by: MikyasDesta <[email protected]>
Co-authored-by: Mikyas Desta <[email protected]>
* Megatron positional encoding alibi fix (#5808) (#5863)

* 1. Debugging.

* 1. Debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

* 1. Debugging.

* 1. Fixed initialization.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* 1. Debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Debugging.

* 1. Removed scale from ALiBi.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated yaml and added support to control number of alibi heads.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Removed num_attention_heads_alibi from configs.

Signed-off-by: Micha Livne <[email protected]>

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <[email protected]>

Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fix segmenting for pcla inference (#5849)

* Fix segmenting for pcla inference

Signed-off-by: Matvei Novikov <[email protected]>

* Fix segmenting for pcla inference

Signed-off-by: Matvei Novikov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Matvei Novikov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* indentation fix (#5861) (#5862)

Signed-off-by: nithinraok <[email protected]>

Signed-off-by: nithinraok <[email protected]>

Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Jason <[email protected]>

* add ambernet to readme (#5872) (#5873)

Signed-off-by: fayejf <[email protected]>

Signed-off-by: fayejf <[email protected]>

Signed-off-by: fayejf <[email protected]>
Co-authored-by: fayejf <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fix wrong label mapping in batch_inference for label_model (#5767) (#5870)

* fix batch inference

* add test for batch

* fix device

Signed-off-by: fayejf <[email protected]>
Co-authored-by: fayejf <[email protected]>
Signed-off-by: Jason <[email protected]>

* WAR for https://github.com/pytorch/pytorch/pull/91526

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fix memory allocation of NeMo Multi-speaker Data Simulator (#5864)

* fix data simulator

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* Adding noise_manifest handling for faster speed

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added multi-gpu feature

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added a parameter for noise source file number

Signed-off-by: Taejin Park <[email protected]>

* Fixed noise_manifest error bug

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* RETRO model finetuning (#5800)

* add save and load dynamic index

Signed-off-by: Yi Dong <[email protected]>

* add chunk stride feature

Signed-off-by: Yi Dong <[email protected]>

* add chunk stride feature

Signed-off-by: Yi Dong <[email protected]>

* add no pq index

Signed-off-by: Yi Dong <[email protected]>

* added megatron lm compatible mode

Signed-off-by: Yi Dong <[email protected]>

* add config

Signed-off-by: Yi Dong <[email protected]>

* fix position embedding

Signed-off-by: Yi Dong <[email protected]>

* added index factory

Signed-off-by: Yi Dong <[email protected]>

* share neighbors and weights among strategies

Signed-off-by: Yi Dong <[email protected]>

* fix bug

Signed-off-by: Yi Dong <[email protected]>

* added metric to faiss index

Signed-off-by: Yi Dong <[email protected]>

* set default to inner product

Signed-off-by: Yi Dong <[email protected]>

* added qa fine-tune dataset

Signed-off-by: Yi Dong <[email protected]>

* added fine tuning code

Signed-off-by: Yi Dong <[email protected]>

* trim it

Signed-off-by: Yi Dong <[email protected]>

* fix data issue

Signed-off-by: Yi Dong <[email protected]>

* fix style

Signed-off-by: Yi Dong <[email protected]>

* added version

Signed-off-by: Yi Dong <[email protected]>

* fix key error

Signed-off-by: Yi Dong <[email protected]>

* make sure to overwrite the cfg

Signed-off-by: Yi Dong <[email protected]>

* make multiple sentence bert available

Signed-off-by: Yi Dong <[email protected]>

* fix the document

Signed-off-by: Yi Dong <[email protected]>

* fix the table

Signed-off-by: Yi Dong <[email protected]>

* fix transformer

Signed-off-by: Yi Dong <[email protected]>

* make sure to turn off the rope in chunked cross attention layer

Signed-off-by: Yi Dong <[email protected]>

* fix the security issue

Signed-off-by: Yi Dong <[email protected]>

* style fix

Signed-off-by: Yi Dong <[email protected]>

* fix codeql issues

Signed-off-by: Yi Dong <[email protected]>

* fix

Signed-off-by: Yi Dong <[email protected]>

* use -1

Signed-off-by: Yi Dong <[email protected]>

* fix empty index

Signed-off-by: Yi Dong <[email protected]>

* clean up

Signed-off-by: Yi Dong <[email protected]>

* fix the lower bound for repetition penalty

Signed-off-by: Yi Dong <[email protected]>

* add retro qa inference strategy

Signed-off-by: Yi Dong <[email protected]>

* added new inference logic

Signed-off-by: Yi Dong <[email protected]>

* working inference

Signed-off-by: Yi Dong <[email protected]>

* fix TP inference

Signed-off-by: Yi Dong <[email protected]>

* revert requirement

Signed-off-by: Yi Dong <[email protected]>

* added file inference

Signed-off-by: Yi Dong <[email protected]>

* use string to prevent collision

Signed-off-by: Yi Dong <[email protected]>

* use NQ test

Signed-off-by: Yi Dong <[email protected]>

* fix prompt

Signed-off-by: Yi Dong <[email protected]>

* fix inference

Signed-off-by: Yi Dong <[email protected]>

* set good defaults for demo

Signed-off-by: Yi Dong <[email protected]>

* replicate adlr

Signed-off-by: Yi Dong <[email protected]>

* make sure to turn off attention reset for megatron lm compatible model

Signed-off-by: Yi Dong <[email protected]>

* style fix

Signed-off-by: Yi Dong <[email protected]>

* fix typo

Signed-off-by: Yi Dong <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix inference error

Signed-off-by: Yi Dong <[email protected]>

* fix logging

Signed-off-by: Yi Dong <[email protected]>

* address comments

Signed-off-by: Yi Dong <[email protected]>

---------

Signed-off-by: Yi Dong <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* [TTS] GAN-based spectrogram enhancer (#5565)

* [TTS] add SpectrogramEnhancer based on StyleGAN 2

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] some tests for spectrogram enhancer

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: a tiny clean up

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: log images during training

Signed-off-by: Roman Korostik <[email protected]>

* exp_manager: pass save_on_train_epoch_end to checkpointing callback

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: add training script and config examples

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix comments

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: don't assume FastPitch

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: better input shapes handling

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix porting error

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix logging and .nemo saving

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: clean up scaling

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: update examples

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: shape handling

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: remove LoggerCollection handling

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: copyright notice for tests

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: use process_batch helper

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: return empty list of available models

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: some docs

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: style --fix

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: chan_last -> channel_last

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: remove unused imports

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: remove unused return value

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: losses are nn.Modules now

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: init optimizers from config

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: unused imports

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: typechecking

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: more tests

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix logging images

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: unclutter prepare_batch

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: init generator and discriminator from the config for consistency with other NeMo models

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: update spectrogram range in the example config

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: comment on loss weights in the example config

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: rename Conv2DMod to Conv2DModulated

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: remove unused imports

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix CodeQL import warnings

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: type_as_recursive -> to_device_recursive

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: move to_device_recursive to helpers

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: move losses to a separate module, add comments

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: add optimizers' entries to config

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix test configs

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: support length masking for 3-dim tensors

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: add masking to spectrogram normalization

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix tests

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: add spectrogram normalization tests

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix imports and formatting in tests

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix docstring typo

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: rename G and D to generator and discriminator

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: better argument naming in interfaces (condition -> input_spectograms, target -> target_spectrograms)

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: formatting

Signed-off-by: Roman Korostik <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [TTS] SpectrogramEnhancer: fix import warnings in modules

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] add resynthesize_dataset.py script

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] add PairedRealFakeSpectrogramsDataset

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: update example config to reflect new data setup

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] resynthesize_dataset.py: remove unused imports

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] resynthesize_dataset.py: use nemo manifest handling

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] resynthesize_dataset.py: remove unused import

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] resynthesize_dataset.py: underscores for .npy names

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: remove return value from a test

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] add length masking helper

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: use common tts length mask function

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] unused imports in tts helpers

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: fix an import

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: introduce computed upsample_factor to generator

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: clean up and clarify validation data setup

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: remove a hardcoded path in the example config

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] SpectrogramEnhancer: configurize max_spectrogram_length in generator

Signed-off-by: Roman Korostik <[email protected]>

* [TTS] resynthesize_dataset.py: consistent dashes and underscores in CLI args

Signed-off-by: Roman Korostik <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Roman Korostik <[email protected]>
Signed-off-by: Roman Korostik <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* Optimizing distributed Adam when running with one work queue (#5560)

* Dist Adam constructs a single param bucket for each GPT layer

Signed-off-by: Tim Moon <[email protected]>

* Synchronize dist Adam reduce-scatters before launching model-parallel all-reduces

Signed-off-by: Tim Moon <[email protected]>

* Configure per-layer dist Adam buckets for BERT and T5

Signed-off-by: Tim Moon <[email protected]>

* Remove unused variables

Signed-off-by: Tim Moon <[email protected]>

* Configure GPT with one dist Adam bucket per virtual pipeline stage

Signed-off-by: Tim Moon <[email protected]>

* Configure BERT with one dist Adam bucket per virtual pipeline stage

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit in Dockerfile

Need recent updates to Apex distributed Adam optimizer.

Signed-off-by: Tim Moon <[email protected]>

* Remove logic for per-virtual-pipeline distopt buckets from T5

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Jason <[email protected]>

* fix(readme): fix typo (#5883)

Signed-off-by: Jean-Louis Queguiner <[email protected]>
Signed-off-by: Jason <[email protected]>

* TTS inference with Heteronym classification model, hc model inference refactoring (#5768)

* refactor inference, fix span detection

Signed-off-by: ekmb <[email protected]>

* fix merge conflicts

Signed-off-by: ekmb <[email protected]>

* fix merge conflicts

Signed-off-by: ekmb <[email protected]>

* remove unused var

Signed-off-by: ekmb <[email protected]>

* clean up, test update

Signed-off-by: ekmb <[email protected]>

* arg name update

Signed-off-by: ekmb <[email protected]>

* merge wip

Signed-off-by: ekmb <[email protected]>

* revert changes

Signed-off-by: ekmb <[email protected]>

* update docs, move heteronym to baseg2p

Signed-off-by: ekmb <[email protected]>

* change wordid file defaults to none

Signed-off-by: ekmb <[email protected]>

* add manifest check

Signed-off-by: ekmb <[email protected]>

* replace homograph with heteronym, upper case wordid for riva, review feedback

Signed-off-by: ekmb <[email protected]>

* add log message, update comment

Signed-off-by: ekmb <[email protected]>

* rename test manifest field

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Jason <[email protected]>

* take out retro doc (#5885) (#5886)

Signed-off-by: Yi Dong <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Signed-off-by: Jason <[email protected]>

* Add option to disable distributed parameters in distributed Adam optimizer (#5685)

* Add option to run dist Adam without distributed params

Similar to DDP, but leverages dist Adam's support for overlapping communication with backward compute

Signed-off-by: Tim Moon <[email protected]>

* Fix bug in grad clipping when dist Adam has redundant params

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Jason <[email protected]>

* [ASR] Separate Audio-to-Text (BPE, Char) dataset construction (#5774)

* Separate full BPE dataset construction

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix the case when the dataset is None

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix comment

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix typos

Signed-off-by: Vladimir Bataev <[email protected]>

* Separate char dataset construction. Fix DALI dataset usage.

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jason <[email protected]>

* transformer duration added and IPA config files added

Signed-off-by: Jason <[email protected]>

* inference issue for pace resolved

Signed-off-by: Jason <[email protected]>

* Latest ONNX developments

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Remove MCD_DTW tarball (#5889)

Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: Jason <[email protected]>

* Block large files from being merged into NeMo main (#5898)

* Attempt to use large-file pre-commit ci hook

Signed-off-by: SeanNaren <[email protected]>

* Set defaults and enforce

Signed-off-by: SeanNaren <[email protected]>

* Set to 1000

Signed-off-by: SeanNaren <[email protected]>

* Remove enforcement

Signed-off-by: SeanNaren <[email protected]>

---------

Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Jason <[email protected]>

* Reduce memory usage in getMultiScaleCosAffinityMatrix function (#5876)

* Updated offline_clustering.py, the getMultiScaleCosAffinityMatrix function, reduced memory usage

Signed-off-by: gabitza-tech <[email protected]>

* torch.cuda.empty_cache() outside forward_infer()

Signed-off-by: Taejin Park <[email protected]>

* Removed unnecessary lines

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Speed up for non torch.jit.script

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* parallelism is default off

Signed-off-by: Taejin Park <[email protected]>

* nme_mat_size is unified as 512, removing redundant docstring

Signed-off-by: Taejin Park <[email protected]>

---------

Signed-off-by: gabitza-tech <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* set max_steps for lr decay through config (#5780)

* set max_steps for lr decay through config

* added warning for optim sched max_steps config option

* reverted changes to modelPT and updated megatron_base_model

* added the experimental cosine annealing scheduler class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update decay_steps for cosine annealing exp class

* added copyright

---------

Co-authored-by: ANMOL GUPTA <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fix transducer and question answering tutorial bugs (#5809) (#5810)

Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Jason <[email protected]>

* update apex install instructions (#5901) (#5902)

Signed-off-by: ericharper <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Jason <[email protected]>

* Hybrid ASR-TTS models (#5659)

Add hybrid ASR-TTS models and text-to-text dataset

Signed-off-by: Vladimir Bataev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* Set providers for ORT inference session (#5903)

Signed-off-by: athitten <[email protected]>
Signed-off-by: Jason <[email protected]>

* [ASR] Configurable metrics for audio-to-audio + removed experimental decorators (#5827)

* Added an option to configure metrics for audio-to-audio models
Removed experimental decorators

Signed-off-by: Ante Jukić <[email protected]>

* Addressed review comments

Signed-off-by: Ante Jukić <[email protected]>

---------

Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: Jason <[email protected]>

* Correct doc for RNNT transcribe() function (#5904)

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Jason <[email protected]>

* Add segmentation export to Audacity label file (#5857)

* Save the segmentation as label file for Audacity

Audacity is a free, open-source audio editor that can import a label file to quickly assess segmentation quality. This commit adds export to the [Audacity label format](https://manual.audacityteam.org/man/importing_and_exporting_labels.html), so the segmentation quality can be assessed, or the segmentation shared, directly after running the segmentation tool.

Signed-off-by: CaraDuf <[email protected]>

* Fix styling

Signed-off-by: CaraDuf <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused score in audacity export

The score is not written to the Audacity label file, so it does not need to be loaded from the segment.

Signed-off-by: CaraDuf <[email protected]>

---------

Signed-off-by: CaraDuf <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>
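
For reference, an Audacity label file is plain text with one label per line: start time, end time, and label text, separated by tabs, with times in seconds. The snippet below is only a minimal sketch of that layout; the function name `format_audacity_labels` and the `(start, end, text)` tuple shape are hypothetical and are not taken from the NeMo segmentation tool.

```python
from typing import Iterable, Tuple


def format_audacity_labels(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) tuples as Audacity label-file text.

    Hypothetical helper for illustration; not the NeMo implementation.
    """
    lines = []
    for start, end, text in segments:
        # One label per line: start<TAB>end<TAB>label, times in seconds.
        lines.append(f"{start:.6f}\t{end:.6f}\t{text}\n")
    return "".join(lines)


if __name__ == "__main__":
    example = [(0.0, 1.5, "hello"), (1.5, 3.25, "world")]
    print(format_audacity_labels(example), end="")
```

Writing the returned string to a `.txt` file should give something that Audacity's label import can read as a label track.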

* Cross-Lingual objectives (XLM) and multilingual (many-many) support for Megatron-NMT (#5026)

* Update blendable dataset, and refactor seq2seq data

Signed-off-by: MaximumEntropy <[email protected]>

* Blendable dataset with binarized mmap working

Signed-off-by: MaximumEntropy <[email protected]>

* Pass seed from cfg to dataset

Signed-off-by: MaximumEntropy <[email protected]>

* Fix multilingual setup

Signed-off-by: MaximumEntropy <[email protected]>

* Add on epoch start reconfiguration

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Update tokenizer creation for multilingual

Signed-off-by: MaximumEntropy <[email protected]>

* Tmp

Signed-off-by: MaximumEntropy <[email protected]>

* Update NMT script

Signed-off-by: MaximumEntropy <[email protected]>

* Remove unused import

Signed-off-by: MaximumEntropy <[email protected]>

* Update training script

Signed-off-by: MaximumEntropy <[email protected]>

* Log consumed samples

Signed-off-by: MaximumEntropy <[email protected]>

* Logging on val epoch end

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Remove redundant print

Signed-off-by: MaximumEntropy <[email protected]>

* Ckpt averaging for non model parallel megatron models

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Empty

Signed-off-by: MaximumEntropy <[email protected]>

* Update error message

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Remove check

Signed-off-by: MaximumEntropy <[email protected]>

* Restore fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Remove ipdb

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Move to classmethods

Signed-off-by: MaximumEntropy <[email protected]>

* Initial

Signed-off-by: MaximumEntropy <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* Refactor masking to add skip_masking_id and working xlm bert and t5 datasets

Signed-off-by: MaximumEntropy <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Testing a simple solution

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed. Seems to work. Need to validate.

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support for CSV and text memmap to Megatron encoder-decoder

Signed-off-by: Micha Livne <[email protected]>

* 1. Added support in CSV.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.
2. Fixed bugs.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed bugs.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated yaml.

Signed-off-by: Micha Livne <[email protected]>

* Minor

Signed-off-by: MaximumEntropy <[email protected]>

* 1. Fixed warnings.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed a bug.

Signed-off-by: Micha Livne <[email protected]>

* Tmp

Signed-off-by: MaximumEntropy <[email protected]>

* Updates

Signed-off-by: MaximumEntropy <[email protected]>

* Fix minor data things

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Lang ids for validation datasets

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes for lang id code at inference

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Remove pdb

Signed-off-by: MaximumEntropy <[email protected]>

* Fix prepend ID and bleu logging

Signed-off-by: MaximumEntropy <[email protected]>

* Refactor

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for many-many NMT

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Reset o2 default

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore dataset utils

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Allreduce bleu scores

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* 1. Loading index file into memmap object.

Signed-off-by: Micha Livne <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed extension when loading files.

Signed-off-by: Micha Livne <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fix redundant building

Signed-off-by: MaximumEntropy <[email protected]>

* PP > 2 for NMT

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Style

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Merge and fix

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Refactor multilingual again

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor and verify data formats

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup

Signed-off-by: MaximumEntropy <[email protected]>

* more fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fix passing langs

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes for bart

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Jason <[email protected]>

* ONNX export working

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fixing unit test

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Update isort to the latest version (#5895)

Update isort to the latest version

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Pin isort version (#5914)

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Moved eval notebook data to aws (#5911)

Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: Jason <[email protected]>

* FilterbankFeaturesTA to match FilterbankFeatures (#5913)

Signed-off-by: Mohamed Saad Ibn Seddik <[email protected]>
Signed-off-by: Jason <[email protected]>

* fixed missing long_description_content_type (#5909)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

* added TPMLP for T5-based models (#5840) (#5841)

Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fixing 0-size issue and ONNX BS>1 trace

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fixing code scan alert

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* update container (#5917)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Jason <[email protected]>

* remove conda pynini install (#5921)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Jason <[email protected]>

* Merge release main (#5916)

* update branch

Signed-off-by: ericharper <[email protected]>

* added TPMLP for T5-based models (#5840)

Signed-off-by: David Mosallanezhad <[email protected]>

Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>

* remove notebook (#5859)

Signed-off-by: ericharper <[email protected]>

Signed-off-by: ericharper <[email protected]>

* update branch

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: David Mosallanezhad <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Signed-off-by: Jason <[email protected]>

* Dynamic freezing in Nemo (#5879)

* Initial commit for dynamic freezing logic

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated logic to handle lists and updated docs

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Transferred dynamic freezing logic to core from asr

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert asr config to original

Signed-off-by: Daniel Egert <[email protected]>

* Fixed tab indent in core.rst

Signed-off-by: Daniel Egert <[email protected]>

* Updated modelPT for latest from master

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed indents in docs

Signed-off-by: Daniel Egert <[email protected]>

---------

Signed-off-by: Daniel Egert <[email protected]>
Co-authored-by: Daniel Egert <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* Fix Windows bug with save_restore_connector (#5919)

* Initial commit for Windows bug with save_to

Signed-off-by: Daniel Egert <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <[email protected]>
Co-authored-by: Daniel Egert <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jason <[email protected]>

* add new languages to doc (#5939)

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Jason <[email protected]>

* Workarounds for ONNX export with autocast

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* fix val loss computation in megatron (#5871)

* fix val loss computation in megatron

* Fix NaN handling during validation

---------

Co-authored-by: ANMOL GUPTA <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Jason <[email protected]>

* Restoring sigmas

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Add core classes and functions for online clustering diarizer part 2 (#5609)

* Add core classes and functions for online clustering diarizer

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add audio to labels code

Signed-off-by: Taejin Park <[email protected]>

* resolve type errors

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added unit-tests for very short audio

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Filled all missing docstrings

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved conflict and added missing docstrings

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed unit-test errors

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix the wrongly added file - megatron_gpt_model.py

Signed-off-by: Taejin Park <[email protected]>

* Fix wrongly included file - megatron_gpt_model.py

Signed-off-by: Taejin Park <[email protected]>

* resolve code quality issue

Signed-off-by: Taejin Park <[email protected]>

* Fixed unit-test errors and bugs

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* changed total_sec for offline_clustering toy_data in unit-tests

Signed-off-by: Taejin Park <[email protected]>

* fixed merging index offset bug

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* only including part 1 files

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed unused function

Signed-off-by: Taejin Park <[email protected]>

* fixed unused imports

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* divided nmesc_clustering.py into two and reflected first-pass comments

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding offline/online_clustering.py

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code QL autocomment

Signed-off-by: Taejin Park <[email protected]>

* Removed unused imports

Signed-off-by: Taejin Park <[email protected]>

* Update nemo/collections/asr/parts/utils/online_clustering.py

Co-authored-by: Sean Naren <[email protected]>
Signed-off-by: Taejin Park <[email protected]>

* Reflected comments

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* resolved code scanning issue

Signed-off-by: Taejin Park <[email protected]>

* Adding online_diarizer.py

Signed-off-by: Taejin Park <[email protected]>

* updated tests and speaker_utils

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed the wrong test eval

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updating online diarizer for variable name change

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Reflected comments and some typo fixes in speaker_utils

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: Sean Naren <[email protected]>
Signed-off-by: Jason <[email protected]>

* Distributed Adam optimizer overlaps param all-gather with forward compute (#5684)

* Add distopt support for overlapping param all-gather with forward compute

Signed-off-by: Tim Moon <[email protected]>

* Update Apex commit

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Jason <[email protected]>

* [TTS][ZH] added new NGC model cards with polyphone disambiguation. (#5940)

* [TTS][ZH] added new NGC model cards with polyphone disambiguation.

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

* Moved truncation of context higher up

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* [TN] bugfix file handler is not closed. (#5955)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

* Added unit test for regulate_len. Unscripted sort_tensor for TRT

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fixed slice

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* [TTS] deprecate AudioToCharWithPriorAndPitchDataset. (#5959)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

* bugfix: file handlers are not closed. (#5956)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

* [TTS][G2P] deprecate add_symbols (#5961)

Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

* fix broken link (#5968)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fix hybridasr bug (#5950) (#5957)

Signed-off-by: Jason <[email protected]>

* Added list_available_models (#5967)

* Added list_available_models

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Added to readme

Signed-off-by: Evgeniy Shabalin <[email protected]>

* added vits to docs

Signed-off-by: Evgeniy Shabalin <[email protected]>

* added vits to docs

Signed-off-by: Evgeniy Shabalin <[email protected]>

---------

Signed-off-by: Evgeniy Shabalin <[email protected]>
Signed-off-by: Evgeniy Shabalin <[email protected]>
Signed-off-by: Jason <[email protected]>

* Move settings to `pyproject.toml`. Remove deprecated `pytest-runner` (#5947)

* Move project settings to pyproject.toml

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove setup.cfg

Signed-off-by: Vladimir Bataev <[email protected]>

* Remove deprecated pytest-runner

Signed-off-by: Vladimir Bataev <[email protected]>

* Add comments

Signed-off-by: Vladimir Bataev <[email protected]>

* Allow only registered markers for pytest

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Fix torchaudio installation (#5850)

* Fail if torchaudio not installed

Signed-off-by: Vladimir Bataev <[email protected]>

* Fix torchaudio matching version

Signed-off-by: Vladimir Bataev <[email protected]>

* Warn if Pytorch major version changed

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jason <[email protected]>

* Update fastpitch.py (#5969)

Signed-off-by: Jason <[email protected]>

* Review comments

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Jason <[email protected]>

* per-micro-batch input loader (#5635)

* per-micro-batch input loader

* per-micro-batch input loader

set arg default val

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

* apply per-microbatch-loader to only GPT

* update docstring on micro-batch input loader

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed the default arg val

* fix batch size to 1 at log stat registration

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update container for CI

Signed-off-by: ericharper <[email protected]>

* update container in jenkinsfile

Signed-off-by: ericharper <[email protected]>

* update container for CI

Signed-off-by: ericharper <[email protected]>

fix merge conflict

* revert Jenkinsfile

* Revert "revert Jenkinsfile"

This reverts commit d23b7757e0f935dacde2840f234193c632a2b3be.

* Update nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py

Signed-off-by: Tim Moon <[email protected]>

* add GradScaler

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ericharper <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Jason <[email protected]>

* update container in readme (#5981)

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Jason <[email protected]>

* Support Alignment Extraction for all RNNT Beam decoding methods (#5925)

* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Partial impl of ALSD alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Remove everything else

Signed-off-by: smajumdar <[email protected]>

* Support dataclass in AbstractRNNTDecoding

Signed-off-by: smajumdar <[email protected]>

* Add first draft unittest

Signed-off-by: smajumdar <[email protected]>

* Correct the logic to move to the next timestep in the alignment

Signed-off-by: smajumdar <[email protected]>

* Finalize ALSD alignment generation

Signed-off-by: smajumdar <[email protected]>

* Add support for TSD greedy alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Add support for mAES greedy alignment extraction

Signed-off-by: smajumdar <[email protected]>

* Finalize extraction of alignments from all beam algorithms for RNNT

Signed-off-by: smajumdar <[email protected]>

* Style fixes

Signed-off-by: smajumdar <[email protected]>

* Add copyright

Signed-off-by: smajumdar <[email protected]>

* Address comments

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Jason <[email protected]>

* Add AWS SageMaker ASR Examples (#5638)

* Base code for AWS SageMaker example

Signed-off-by: SeanNaren <[email protected]>

* Remove format

Signed-off-by: SeanNaren <[email protected]>

* wrap

Signed-off-by: SeanNaren <[email protected]>

* Add a notebook with the code

Signed-off-by: SeanNaren <[email protected]>

* Setup

Signed-off-by: SeanNaren <[email protected]>

* Update notebook

Signed-off-by: SeanNaren <[email protected]>

* Remove space

Signed-off-by: SeanNaren <[email protected]>

* Fix spelling mistake

Signed-off-by: SeanNaren <[email protected]>

* Add message to explain usage

Signed-off-by: SeanNaren <[email protected]>

* Add CommonVoice esperanto example

Signed-off-by: SeanNaren <[email protected]>

* Fix path

Signed-off-by: SeanNaren <[email protected]>

* Fixes

Signed-off-by: SeanNaren <[email protected]>

* Import sox locally, add documentation

Signed-off-by: SeanNaren <[email protected]>

* Address reviews

Signed-off-by: SeanNaren <[email protected]>

* Address reviews

Signed-off-by: SeanNaren <[email protected]>

* Address reviews

Signed-off-by: SeanNaren <[email protected]>

* Add cell to download the SSL model

Signed-off-by: SeanNaren <[email protected]>

* Set max epochs to 300

Signed-off-by: SeanNaren <[email protected]>

* Fixes, introduce HF dataset instructions

Signed-off-by: SeanNaren <[email protected]>

* Upstream updates from other branch

Signed-off-by: SeanNaren <[email protected]>

* Fix warning

Signed-off-by: SeanNaren <[email protected]>

* Add README, add image

Signed-off-by: SeanNaren <[email protected]>

* Fix warning

Signed-off-by: SeanNaren <[email protected]>

* Address feedback

Signed-off-by: SeanNaren <[email protected]>

* Feedback

Signed-off-by: SeanNaren <[email protected]>

---------

Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: Jason <[email protected]>

* Update PUBLICATIONS.md (#5963)

* Add papers from 2022/2022 to PUBLICATIONS.md

Signed-off-by: smajumdar <[email protected]>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <[email protected]>

* Remove ipynb from being tracked as for nemo code library

Signed-off-by: smajumdar <[email protected]>

* Add additional papers

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Jason <[email protected]>

* [G2P] fixed typos and broken import library. (#5978) (#5979)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

* [G2P] added backward compatibility for english tokenizer and fixed unit tests (#5980) (#5984)

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Jason <[email protected]>
Signed-off-by: Matvei Novikov <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: fayejf <[email protected]>
Signed-off-by: fayejf <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: Yi Dong <[email protected]>
Signed-off-by: Roman Korostik <[email protected]>
Signed-off-by: Roman Korostik <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Jean-Louis Queguiner <[email protected]>
Signed-off-by: ekmb <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Jocelyn Huang <[email protected]>
Signed-off-by: SeanNaren <[email protected]>
Signed-off-by: gabitza-tech <[email protected]>
Signed-off-by: ericharper <[email protected]>
Signed-off-by: athitten <[email protected]>
Signed-off-by: Ante Jukić <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Signed-off-by: CaraDuf <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Micha Livne <[email protected]>
Signed-off-by: Mohamed Saad Ibn Seddik <[email protected]>
Signed-off-by: Xuesong Yang <[email protected]>
Signed-off-by: David Mosallanezhad <[email protected]>
Signed-off-by: Daniel Egert <[email protected]>
Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Evgeniy Shabalin <[email protected]>
Signed-off-by: Evgeniy Shabalin <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Matvei Novikov <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: fayejf <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: Yi Dong <[email protected]>
Co-authored-by: Roman Korostik <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Jean-Louis Queguiner <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Mikyas Desta <[email protected]>
Co-authored-by: Jocelyn <[email protected]>
Co-authored-by: Sean Naren <[email protected]>
Co-authored-by: Gabriel Pirlogeanu <[email protected]>
Co-authored-by: anmolgupt <[email protected]>
Co-authored-by: ANMOL GUPTA <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
Co-authored-by: Zhilin Wang <[email protected]>
Co-authored-by: athitten <[email protected]>
Co-authored-by: anteju <[email protected]>
Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: CaraDuf <[email protected]>
Co-authored-by: Sandeep Subramanian <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Mohamed Saad Ibn Seddik <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: David Mosallanezhad <[email protected]>
Co-authored-by: trias702 <[email protected]>
Co-authored-by: Daniel Egert <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: Mikołaj Błaż <[email protected]>
Co-authored-by: Evgeniy Shabalin <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
* fast conformer configs and doc



* feedback



* adding fast conformer to main README



* path changes



* rewording



* further doc changes



* naming

---------

Signed-off-by: Dima Rekesh <[email protected]>
Co-authored-by: Dima Rekesh <[email protected]>
Co-authored-by: Dima Rekesh <[email protected]>
…ulator (#5897)

* fix silence insertion

Signed-off-by: stevehuang52 <[email protected]>

* update docs and tutorial

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* change to beta and gamma distributions

Signed-off-by: stevehuang52 <[email protected]>

* update

Signed-off-by: stevehuang52 <[email protected]>

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* Added silence vs overlap selector with overlap algo

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Function name change and fixes

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update silence and overlap adding algorithm for better accuracy

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Recommended range for overlap mean

Signed-off-by: Taejin Park <[email protected]>

* Changing yaml file default values

Signed-off-by: Taejin Park <[email protected]>

* Fixed typos and errors in docstrings

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed minor bugs and removed unused functions

Signed-off-by: Taejin Park <[email protected]>

* Fixed minor bugs and removed unused imports

Signed-off-by: Taejin Park <[email protected]>

* Added docstrings for newly updated overlap algos

Signed-off-by: Taejin Park <[email protected]>

* Fixed non_silence_len_samples calculation, more accurate now

Signed-off-by: Taejin Park <[email protected]>

* adding missing docstring for non_silence_len

Signed-off-by: Taejin Park <[email protected]>

* removed ipdb lines

Signed-off-by: Taejin Park <[email protected]>

* refactor and update

Signed-off-by: stevehuang52 <[email protected]>

* updated logs for v1.1

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Argument check update for mean=0 var=0 case

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix typo

Signed-off-by: stevehuang52 <[email protected]>

* update silence/overlap mean clipping

Signed-off-by: stevehuang52 <[email protected]>

* Adding mean clipping

Signed-off-by: Taejin Park <[email protected]>

* added 0 handling for ovl/sim_mean

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tested on fisher and fixed the bug with string-speaker ID

Signed-off-by: Taejin Park <[email protected]>

* update code for visualization

Signed-off-by: stevehuang52 <[email protected]>

* refactor

Signed-off-by: stevehuang52 <[email protected]>

* fix load_rttm

Signed-off-by: stevehuang52 <[email protected]>

* Adding docstrings

Signed-off-by: Taejin Park <[email protected]>

* Adding usage in the analysis script

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix filename

Signed-off-by: stevehuang52 <[email protected]>

* Added argument check for sentence length params

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed unnecessary NB torch sampling

Signed-off-by: Taejin Park <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add build_synthetic_vad_manifest.py

Signed-off-by: stevehuang52 <[email protected]>

* add check for non rttm files

Signed-off-by: stevehuang52 <[email protected]>

* added docstrings

Signed-off-by: Taejin Park <[email protected]>

* typo is fixed

Signed-off-by: Taejin Park <[email protected]>

* License template was missing, added

Signed-off-by: Taejin Park <[email protected]>

* add missing copyright and move script

Signed-off-by: stevehuang52 <[email protected]>

* add missing comma

Signed-off-by: stevehuang52 <[email protected]>

---------

Signed-off-by: stevehuang52 <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Artem Zemliak <[email protected]>
Co-authored-by: Artem Zemliak <[email protected]>
* retrieval service separation

Signed-off-by: Yi Dong <[email protected]>

* refactor service code

Signed-off-by: Yi Dong <[email protected]>

* fix name

Signed-off-by: Yi Dong <[email protected]>

* add combo server

Signed-off-by: Yi Dong <[email protected]>

* added combo files

Signed-off-by: Yi Dong <[email protected]>

* fix the bug

Signed-off-by: Yi Dong <[email protected]>

* add retrieval service

Signed-off-by: Yi Dong <[email protected]>

* fix updatable flag

Signed-off-by: Yi Dong <[email protected]>

* working example

Signed-off-by: Yi Dong <[email protected]>

* separate text generation server

Signed-off-by: Yi Dong <[email protected]>

* added webserver

Signed-off-by: Yi Dong <[email protected]>

* clean up and fix zero neighbor issue

Signed-off-by: Yi Dong <[email protected]>

* fix the style

Signed-off-by: Yi Dong <[email protected]>

* add license

Signed-off-by: Yi Dong <[email protected]>

* fixed code QL

Signed-off-by: Yi Dong <[email protected]>

* added bash script to launch the demo

Signed-off-by: Yi Dong <[email protected]>

* clean up

Signed-off-by: Yi Dong <[email protected]>

---------

Signed-off-by: Yi Dong <[email protected]>
* Use module-based k2 import guard

Signed-off-by: Vladimir Bataev <[email protected]>

---------

Signed-off-by: Vladimir Bataev <[email protected]>
* storing

* Added VITS documentation

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Added VITS documentation

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Cleaned stuff

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Cleaned stuff

Signed-off-by: Evgeniy Shabalin <[email protected]>

* cleaning

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Typos

Signed-off-by: Evgeniy Shabalin <[email protected]>

* Added experimental note

Signed-off-by: Evgeniy Shabalin <[email protected]>

---------

Signed-off-by: Evgeniy Shabalin <[email protected]>
Signed-off-by: Abhinav Khattar <[email protected]>
Co-authored-by: Abhinav Khattar <[email protected]>
Signed-off-by: smajumdar <[email protected]>
Co-authored-by: Eric Harper <[email protected]>
* Added documentation section for ASR datasets from AIStore

Signed-off-by: Ante Jukić <[email protected]>

* Address review comments

Signed-off-by: Ante Jukić <[email protected]>

---------

Signed-off-by: Ante Jukić <[email protected]>
commit b31f117
Author: Boris Fomitchev <[email protected]>
Date:   Tue Feb 14 15:12:30 2023 -0800

    TJ hacks

    Signed-off-by: Boris Fomitchev <[email protected]>

commit 7caae20
Author: Boris Fomitchev <[email protected]>
Date:   Tue Feb 14 10:06:04 2023 -0800

    Ragged batching changes for RadTTS, some refactoring

    Signed-off-by: Boris Fomitchev <[email protected]>

Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Jason <[email protected]>
* quick fix

Signed-off-by: Jason <[email protected]>

* undo

Signed-off-by: Jason <[email protected]>

---------

Signed-off-by: Jason <[email protected]>
* GPT no longer explicitly overlaps distopt communication with forward compute

Signed-off-by: Tim Moon <[email protected]>

* Remove unused import

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
ekmb and others added 12 commits February 15, 2023 14:22
* remove TN

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix imports

Signed-off-by: ekmb <[email protected]>

* fix import

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing init

Signed-off-by: ekmb <[email protected]>

* fix import

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename unit test

Signed-off-by: ekmb <[email protected]>

* fix import

Signed-off-by: ekmb <[email protected]>

* fix modules test

Signed-off-by: ekmb <[email protected]>

* fix imports

Signed-off-by: ekmb <[email protected]>

* remove whitelist from config

Signed-off-by: ekmb <[email protected]>

* delete wordid file

Signed-off-by: ekmb <[email protected]>

* remove pynini_install from tutorials

Signed-off-by: ekmb <[email protected]>

* update requirements

Signed-off-by: ekmb <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support warning

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

---------

Signed-off-by: ekmb <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* patch to allow using tokenizers without additional_special_tokens_ids attribute

Signed-off-by: arendu <[email protected]>

* early stop callback for prompt/p tuning

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* added exp manager config for early stop

Signed-off-by: arendu <[email protected]>

* pushed logic for creating early stopping inside exp manager

Signed-off-by: arendu <[email protected]>

* pushed logic for creating early stopping inside exp manager

Signed-off-by: arendu <[email protected]>

* minor updates and added dataclass check

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more args

Signed-off-by: arendu <[email protected]>

* more args

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Tn doc 16 (#5954)
* fix new repo links

Signed-off-by: Yang Zhang <[email protected]>
Add model.eval() to ensure the accuracy.

Signed-off-by: Slyne Deng <[email protected]>
Co-authored-by: Slyne Deng <[email protected]>
* add random seed in perturb

Signed-off-by: fayejf <[email protected]>

* small update

Signed-off-by: fayejf <[email protected]>

* update evaluator config

Signed-off-by: fayejf <[email protected]>

* update tutorial

Signed-off-by: fayejf <[email protected]>

* update add_noise

Signed-off-by: fayejf <[email protected]>

---------

Signed-off-by: fayejf <[email protected]>
* Add Customization Dataset Preparation Tool

Allows users to read data into prompt-and-completion format .jsonl as expected by the Customization service/NeMo LLM P tuning service

Signed-off-by: Zhilin Wang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add license and usage examples, remove tutorial

Signed-off-by: Zhilin Wang <[email protected]>

* Fix typo

Signed-off-by: Zhilin Wang <[email protected]>

* Fix some more typos

---------

Signed-off-by: Zhilin Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
* Some simplifications

Signed-off-by: Igor Gitman <[email protected]>

* Add tests for stochastic depth

Signed-off-by: Igor Gitman <[email protected]>

* Fix tests for stochastic depth

Signed-off-by: Igor Gitman <[email protected]>

* Add interctc loss and logs

Signed-off-by: Igor Gitman <[email protected]>

* Fix a few issues

Signed-off-by: Igor Gitman <[email protected]>

* Add interctc loss tests

Signed-off-by: Igor Gitman <[email protected]>

* Add docs

Signed-off-by: Igor Gitman <[email protected]>

* Add training_step test for interctc

Signed-off-by: Igor Gitman <[email protected]>

* Refactoring with AccessMixin WIP

Signed-off-by: Igor Gitman <[email protected]>

* Separate interctc logic into a mixin

Signed-off-by: Igor Gitman <[email protected]>

* Fix tests

Signed-off-by: Igor Gitman <[email protected]>

* Fix some lint errors

Signed-off-by: Igor Gitman <[email protected]>

* Small refactoring

Signed-off-by: Igor Gitman <[email protected]>

* Add more docs, fix PR comments

Signed-off-by: Igor Gitman <[email protected]>

* Add other encoder support + more refactoring

Signed-off-by: Igor Gitman <[email protected]>

* Add more config examples

Signed-off-by: Igor Gitman <[email protected]>

* Move stochastic depth setup to utils

Signed-off-by: Igor Gitman <[email protected]>

* Add interctc_enabled setter + more docs

Signed-off-by: Igor Gitman <[email protected]>

* Fix a few doc strings for better web display

Signed-off-by: Igor Gitman <[email protected]>

* Update CTC flow diagram

Signed-off-by: Igor Gitman <[email protected]>

---------

Signed-off-by: Igor Gitman <[email protected]>
* Add pyctcdecode to high level beam search API

Signed-off-by: smajumdar <[email protected]>

* Remove redundant assignment

Signed-off-by: smajumdar <[email protected]>

---------

Signed-off-by: smajumdar <[email protected]>
* Initial

Signed-off-by: MaximumEntropy <[email protected]>

* Multiple fixes

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add to CI test

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* check position embs for gpt prompt learning

Signed-off-by: Adi Renduchintala <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update args

Signed-off-by: MaximumEntropy <[email protected]>

* Disable tts unit test

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Fix

Signed-off-by: MaximumEntropy <[email protected]>

* Empty

Signed-off-by: MaximumEntropy <[email protected]>

* Update Jenkinsfile

Changed optimizer for GPT training from 'fused_adam' to 'distributed_fused_adam'.

Signed-off-by: khcs <[email protected]>

* update config to use correct key

Signed-off-by: ericharper <[email protected]>

* revert Jenkinsfile back to fused_adam

Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Adi Renduchintala <[email protected]>
Signed-off-by: khcs <[email protected]>
Signed-off-by: ericharper <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <[email protected]>
Co-authored-by: khcs <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: ericharper <[email protected]>
* patch to allow using tokenizers without additional_special_tokens_ids attribute

Signed-off-by: arendu <[email protected]>

* early stop callback for prompt/p tuning

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <[email protected]>

* added exp manager config for early stop

Signed-off-by: arendu <[email protected]>

* pushed logic for creating early stopping inside exp manager

Signed-off-by: arendu <[email protected]>

* pushed logic for creating early stopping inside exp manager

Signed-off-by: arendu <[email protected]>

* minor updates and added dataclass check

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more args

Signed-off-by: arendu <[email protected]>

* more args

Signed-off-by: arendu <[email protected]>

* wrap tpmlp inside prompt encoder

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* updates removed unused imports

Signed-off-by: arendu <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removes typecheck for tpmlp module

Signed-off-by: arendu <[email protected]>

---------

Signed-off-by: arendu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@borisfom borisfom closed this Feb 19, 2023
messiaen pushed a commit that referenced this pull request Mar 7, 2023
Signed-off-by: Boris Fomitchev <[email protected]>
messiaen added a commit that referenced this pull request Mar 14, 2023
* cache-aware streaming export

Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>

* fix export for full-context conformer

* WIP trying to improve onnx perf

Signed-off-by: Greg Clark <[email protected]>

* Adding test scripts

Signed-off-by: Greg Clark <[email protected]>

* More perf testing script

Signed-off-by: Greg Clark <[email protected]>

* Updates for jit torch_tensorrt tracing

Signed-off-by: Greg Clark <[email protected]>

* Fixed trace warnings

Signed-off-by: Boris Fomitchev <[email protected]>

* Rearranging tests

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing non-caching case

Signed-off-by: Boris Fomitchev <[email protected]>

* testing

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed channel cache length issue

Signed-off-by: Boris Fomitchev <[email protected]>

* cache-aware streaming export

Test onnx streaming conformer ctc WER

Constant att cache width with len param

Remove some extra functions in cache_aware runner

transpose cache so that batch is first for trt

Signed-off-by: Greg Clark <[email protected]>

* fix export for full-context conformer

* WIP trying to improve onnx perf

Signed-off-by: Greg Clark <[email protected]>

* Adding test scripts

Signed-off-by: Greg Clark <[email protected]>

* More perf testing script

Signed-off-by: Greg Clark <[email protected]>

* Updates for jit torch_tensorrt tracing

Signed-off-by: Greg Clark <[email protected]>

* stash

Signed-off-by: Boris Fomitchev <[email protected]>

* Reverting non-essential changes

Signed-off-by: Boris Fomitchev <[email protected]>

* Offset=None case

Signed-off-by: Boris Fomitchev <[email protected]>

* Remove test scripts

Signed-off-by: Greg Clark <[email protected]>

* Clean up speech_to_text_cache_aware_streaming_infer

Signed-off-by: Greg Clark <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert pad -> constant_pad_nd

Signed-off-by: Greg Clark <[email protected]>

* conformer-encoder set window_size from streaming_cfg

Signed-off-by: Greg Clark <[email protected]>

* Fixes for working export(), using more constants

Signed-off-by: Boris Fomitchev <[email protected]>

* Optional rand init for cache

Signed-off-by: Greg Clark <[email protected]>

* Folding update_cache with constants

Signed-off-by: Boris Fomitchev <[email protected]>

* More folding

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff #1

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff #2

Signed-off-by: Boris Fomitchev <[email protected]>

* Reducing diff #3

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixed unit tests, more reverts

Signed-off-by: Boris Fomitchev <[email protected]>

* Export fixes

Signed-off-by: Boris Fomitchev <[email protected]>

* Reverted slice changes that ruined ONNX perf

Signed-off-by: Boris Fomitchev <[email protected]>

* Adding back keep_all_outputs and drop_extra_preencoded

Signed-off-by: Greg Clark <[email protected]>

* Fix export

Signed-off-by: Greg Clark <[email protected]>

---------

Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Boris Fomitchev <[email protected]>
Co-authored-by: Boris Fomitchev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Vahid Noroozi <[email protected]>